Fast Context Adaptation via Meta-Learning
We propose CAVIA for meta-learning, a simple extension to MAML that is less
prone to meta-overfitting, easier to parallelise, and more interpretable. CAVIA
partitions the model parameters into two parts: context parameters that serve
as additional input to the model and are adapted on individual tasks, and
shared parameters that are meta-trained and shared across tasks. At test time,
only the context parameters are updated, leading to a low-dimensional task
representation. We show empirically that CAVIA outperforms MAML for regression,
classification, and reinforcement learning. Our experiments also highlight
weaknesses in current benchmarks, in that the amount of adaptation needed in
some cases is small.
Comment: Published at the International Conference on Machine Learning (ICML) 201
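To make the partitioning concrete, here is a minimal sketch of CAVIA-style test-time adaptation on 1D linear regression (illustrative only; the paper uses neural networks and meta-trains the shared parameters across tasks). The context parameters phi are appended to the model input, and only phi is updated on a new task.

```python
import numpy as np

def predict(x, theta, phi):
    # The model sees the input x concatenated with the context parameters phi.
    return np.concatenate([x, phi]) @ theta

def adapt_context(task_x, task_y, theta, phi, lr=0.1, steps=20):
    # Gradient descent on phi only; the shared parameters theta stay fixed,
    # as CAVIA does at test time.
    phi = phi.copy()
    d = len(task_x[0])
    for _ in range(steps):
        grad = np.zeros_like(phi)
        for x, y in zip(task_x, task_y):
            err = predict(x, theta, phi) - y
            grad += 2.0 * err * theta[d:]  # d(err^2)/d(phi)
        phi -= lr * grad / len(task_x)
    return phi
```

Because only the low-dimensional phi changes per task, the adapted values double as a compact task representation.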
TACO: Learning Task Decomposition via Temporal Alignment for Control
Many advanced Learning from Demonstration (LfD) methods consider the
decomposition of complex, real-world tasks into simpler sub-tasks. By reusing
the corresponding sub-policies within and between tasks, they provide training
data for each policy from different high-level tasks and compose them to
perform novel ones. Existing approaches to modular LfD focus either on learning
a single high-level task or depend on domain knowledge and temporal
segmentation. In contrast, we propose a weakly supervised, domain-agnostic
approach based on task sketches, which include only the sequence of sub-tasks
performed in each demonstration. Our approach simultaneously aligns the
sketches with the observed demonstrations and learns the required sub-policies.
This improves generalisation in comparison to separate optimisation procedures.
We evaluate the approach on multiple domains, including a simulated 3D robot
arm control task using purely image-based observations. The results show that
our approach performs commensurately with fully supervised approaches, while
requiring significantly less annotation effort.
Comment: 12 pages. Published at ICML 201
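The temporal-alignment idea can be sketched with a simple monotone dynamic program that assigns each timestep of a demonstration to a position in the task sketch. This is illustrative only: the per-step costs are assumed given here, whereas TACO learns them jointly with the sub-policies.

```python
def align(cost):
    """cost[t][k]: cost of assigning timestep t to sketch position k.
    Returns a monotone alignment ending at the last sketch position."""
    T, K = len(cost), len(cost[0])
    INF = float("inf")
    dp = [[INF] * K for _ in range(T)]
    dp[0][0] = cost[0][0]  # alignments start at the first sub-task
    for t in range(1, T):
        for k in range(K):
            stay = dp[t - 1][k]
            advance = dp[t - 1][k - 1] if k > 0 else INF
            dp[t][k] = cost[t][k] + min(stay, advance)
    # Backtrack from the last sketch position at the last timestep.
    k, path = K - 1, [K - 1]
    for t in range(T - 1, 0, -1):
        if k > 0 and dp[t - 1][k - 1] <= dp[t - 1][k]:
            k -= 1
        path.append(k)
    path.reverse()
    return path, dp[T - 1][K - 1]
```

For example, with costs favouring sub-task 0 early and sub-task 1 late, the DP recovers the segmentation without any per-timestep labels.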
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Trading off exploration and exploitation in an unknown environment is key to
maximising expected return during learning. A Bayes-optimal policy, which does
so optimally, conditions its actions not only on the environment state but on
the agent's uncertainty about the environment. Computing a Bayes-optimal policy
is however intractable for all but the smallest tasks. In this paper, we
introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to
perform approximate inference in an unknown environment, and incorporate task
uncertainty directly during action selection. In a grid-world domain, we
illustrate how variBAD performs structured online exploration as a function of
task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in
meta-RL and show that it achieves higher online return than existing methods.
Comment: Published at ICLR 202
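A toy sketch of the underlying idea of conditioning actions on task uncertainty (not the variBAD architecture, which meta-learns an approximate posterior with a variational encoder): maintain a belief over a discrete set of reward hypotheses and update it from observed rewards, so the policy can take the belief as an extra input.

```python
import numpy as np

def update_belief(belief, arm, reward, likelihoods):
    # likelihoods[h][arm] = P(reward = 1 | hypothesis h, arm).
    # Exact Bayes update over the discrete hypothesis set.
    post = np.array([
        b * (likelihoods[h][arm] if reward == 1 else 1 - likelihoods[h][arm])
        for h, b in enumerate(belief)
    ])
    return post / post.sum()
```

A Bayes-adaptive policy would then act on the pair (state, belief), exploring more when the belief is close to uniform and exploiting once it concentrates.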
Hierarchical Imitation Learning for Stochastic Environments
Many applications of imitation learning require the agent to generate the
full distribution of behaviour observed in the training data. For example, to
evaluate the safety of autonomous vehicles in simulation, accurate and diverse
behaviour models of other road users are paramount. Existing methods that
improve this distributional realism typically rely on hierarchical policies.
These condition the policy on types such as goals or personas that give rise to
multi-modal behaviour. However, such methods are often inappropriate for
stochastic environments where the agent must also react to external factors:
because agent types are inferred from the observed future trajectory during
training, these environments require that the contributions of internal and
external factors to the agent behaviour are disentangled and only internal
factors, i.e., those under the agent's control, are encoded in the type.
Encoding future information about external factors leads to inappropriate agent
reactions during testing, when the future is unknown and types must be drawn
independently from the actual future. We formalize this challenge as
distribution shift in the conditional distribution of agent types under
environmental stochasticity. We propose Robust Type Conditioning (RTC), which
eliminates this shift with adversarial training under randomly sampled types.
Experiments on two domains, including the large-scale Waymo Open Motion
Dataset, show improved distributional realism while maintaining or improving
task performance compared to state-of-the-art baselines.
Comment: Published at IROS'2
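The distribution shift can be illustrated numerically with a toy model (hypothetical names, not the RTC method itself): let g be an internal factor under the agent's control, e an external event, and let the training-time "type" z be inferred from the observed future, which depends on both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
g = rng.integers(0, 2, n)   # internal factor (the agent's goal)
e = rng.integers(0, 2, n)   # external factor, not under the agent's control
z = g | e                   # type inferred from the future entangles e

# Training-time conditional distribution of the type given the external event:
p_z_given_e0 = z[e == 0].mean()   # ~0.5
p_z_given_e1 = z[e == 1].mean()   # 1.0
# At test time z must be sampled independently of e (marginal ~0.75), so a
# policy trained under p(z | e) faces a shift in its type distribution.
```

Encoding only the internal factor (z = g) would remove this dependence and eliminate the shift, which is what RTC enforces through adversarial training.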
Learning from Demonstration in the Wild
Learning from demonstration (LfD) is useful in settings where hand-coding
behaviour or a reward function is impractical. It has succeeded in a wide range
of problems but typically relies on manually generated demonstrations or
specially deployed sensors and has not generally been able to leverage the
copious demonstrations available in the wild: those that capture behaviours
that were occurring anyway using sensors that were already deployed for another
purpose, e.g., traffic camera footage capturing demonstrations of natural
behaviour of vehicles, cyclists, and pedestrians. We propose Video to Behaviour
(ViBe), a new approach to learn models of behaviour from unlabelled raw video
data of a traffic scene collected from a single, monocular, initially
uncalibrated camera with ordinary resolution. Our approach calibrates the
camera, detects relevant objects, tracks them through time, and uses the
resulting trajectories to perform LfD, yielding models of naturalistic
behaviour. We apply ViBe to raw videos of a traffic intersection and show that
it can learn purely from videos, without additional expert knowledge.
Comment: Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2019; extended version with appendix
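The tracking-to-trajectories step of such a pipeline can be sketched as greedy nearest-neighbour association of per-frame detections into tracks. This is a hypothetical simplification: the actual ViBe pipeline also calibrates the camera and runs a learned detector, both omitted here.

```python
import numpy as np

def link_detections(frames, max_dist=2.0):
    """frames: list of per-frame detection lists, each detection an (x, y) pair.
    Returns a list of trajectories (lists of points)."""
    tracks = []   # all trajectories, finished and active
    active = []   # indices into tracks still being extended
    for dets in frames:
        dets = [np.asarray(d, dtype=float) for d in dets]
        used, next_active = set(), []
        for ti in active:
            last = tracks[ti][-1]
            # Greedily pick the nearest unused detection within max_dist.
            best, best_d = None, max_dist
            for j, d in enumerate(dets):
                if j in used:
                    continue
                dist = np.linalg.norm(d - last)
                if dist < best_d:
                    best, best_d = j, dist
            if best is not None:
                tracks[ti].append(dets[best])
                used.add(best)
                next_active.append(ti)
        # Unmatched detections start new trajectories.
        for j, d in enumerate(dets):
            if j not in used:
                tracks.append([d])
                next_active.append(len(tracks) - 1)
        active = next_active
    return tracks
```

The resulting trajectories are what the LfD stage consumes as demonstrations of naturalistic behaviour.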